Modeling the Statistical Idiosyncrasy of Multiword Expressions
Abstract
The focus of this work is statistical idiosyncrasy (or collocational weight) as a discriminant property of multiword expressions. We formalize and model this property, compile a 2-class dataset of MWE and non-MWE examples, and evaluate our models on this dataset. We present a possible empirical implementation of collocational weight and study its effects on identification and extraction of MWEs. Our models prove to be more effective than baselines in identifying noun/adjective-noun MWEs.
Similar resources
Distinguishing Subtypes of Multiword Expressions Using Linguistically-Motivated Statistical Measures
We identify several classes of multiword expressions that each require a different encoding in a (computational) lexicon, as well as a different treatment within a computational system. We examine linguistic properties pertaining to the degree of semantic idiosyncrasy of these classes of expressions. Accordingly, we propose statistical measures to quantify each property, and use the measures to...
A System for Compound Noun Multiword Expression Extraction for Hindi
Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...
A Distributional Account of the Semantics of Multiword Expressions
The lexical status of multiword expressions (MWEs), such as make a decision and shoot the breeze, has long been a matter of debate. Although MWEs behave much like phrases on the surface, it has been argued that they should be treated like words because their components together form a single unit of meaning. However, MWEs are not a homogeneous lexical category, but rather can have distinct sema...
Lexical idiosyncrasy in MWE extraction
A wide range of NLP methods has been investigated for the extraction of Multiword Expressions from large corpora. While a good deal of recent research has focused on developing reliable means to delineate different subclasses of MWEs with respect to their degree of compositionality (Baldwin et al., 2003; McCarthy et al., 2003), it has been generally accepted that fo...
Improving Statistical Machine Translation Using Domain Bilingual Multiword Expressions
Multiword expressions (MWEs) have proven useful for many natural language processing tasks. However, how to use them to improve the performance of statistical machine translation (SMT) is not well studied. This paper presents a simple yet effective strategy to extract domain bilingual multiword expressions. In addition, we implement three methods to integrate bilingual MWEs into Moses, the state...